Intel® C/C++ Compiler Demos at CGDC98
Intel will be showing two compiler demos at CGDC98, both of which showcase the
latest Intel® C/C++ Compiler technology. These demos are the
TimeDCT Demo and the Intel JPEG Library Looping Demo. The TimeDCT Demo and complete source
code are available for download from this web page.
TimeDCT Demo Program
The TimeDCT demo program is a Windows* MFC application that demonstrates the
performance capabilities of the Intel C/C++ Compiler by focusing on MMX technology.
To do this, the demo calls an inverse Discrete Cosine Transform (iDCT) routine that is
coded in four different ways and displays the length of time required for execution of
each call.
The four methods of coding are:
- Standard C
- MMX technology assembly code
- Intrinsic functions for MMX technology (supported only by the Intel C/C++ Compiler)
- Ivec vector class library for MMX technology (supported only by the Intel C/C++
Compiler)
Three pieces of the demo come from the Intel JPEG Library: the C version of the
algorithm (iDCT_AAN() in the aan_idct.cpp file), the MMX technology assembly version
(MMX_iDCT8X8AAN() in the maan_idct.cpp file), and the timing code
(clear/start/stoptimer() in the timing.cpp file).
This library contains hand-optimized MMX technology assembly and non-MMX technology C
routines to encode/decode JPEG images according to the type of processor. The library was developed
in the Intel Architecture Labs, and has
been highly tuned for Intel Pentium® II processors.
The intrinsics for MMX technology enable the user to code in C or C++, eliminating the
time required to hand-optimize assembly language, while offering nearly the same
performance. There are intrinsics for virtually all of the MMX instructions, and the
compiler handles register allocation and instruction scheduling. The
intrinsics version of the iDCT, MMXIntrin_iDCT8X8AAN()
in the iaan_idct.cpp file,
implements the same algorithm as the assembly version, but was coded in a fraction of the
time; it executes only about 7% slower than the assembly version.
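To give a feel for the intrinsics style, here is a minimal sketch of a saturating 16-bit
butterfly (sum and difference), an operation of the kind found in DCT code. It is not
taken from MMXIntrin_iDCT8X8AAN(); the function names butterfly_c and butterfly_mmx are
made up for illustration. The intrinsics themselves (_mm_set_pi16, _mm_adds_pi16,
_mm_subs_pi16, _mm_empty) come from the mmintrin.h header, and the sketch falls back to
the plain C loop when the compiler is not targeting MMX technology.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__MMX__)
#include <mmintrin.h>   // __m64 and the MMX intrinsics
#endif

// Clamp a wider intermediate result to the signed 16-bit range.
static std::int16_t saturate16(int x) {
    if (x > 32767) return 32767;
    if (x < -32768) return -32768;
    return static_cast<std::int16_t>(x);
}

// Plain C/C++ reference: one element at a time.
void butterfly_c(const std::int16_t a[4], const std::int16_t b[4],
                 std::int16_t sum[4], std::int16_t diff[4]) {
    for (int i = 0; i < 4; ++i) {
        sum[i]  = saturate16(a[i] + b[i]);
        diff[i] = saturate16(a[i] - b[i]);
    }
}

// Intrinsics version (hypothetical name): four 16-bit elements per operation.
// No registers are named; the compiler allocates and schedules them.
void butterfly_mmx(const std::int16_t a[4], const std::int16_t b[4],
                   std::int16_t sum[4], std::int16_t diff[4]) {
#if defined(__MMX__)
    __m64 va = _mm_set_pi16(a[3], a[2], a[1], a[0]);  // arguments run high to low
    __m64 vb = _mm_set_pi16(b[3], b[2], b[1], b[0]);
    __m64 vs = _mm_adds_pi16(va, vb);                 // saturating add
    __m64 vd = _mm_subs_pi16(va, vb);                 // saturating subtract
    std::memcpy(sum,  &vs, sizeof vs);
    std::memcpy(diff, &vd, sizeof vd);
    _mm_empty();                                      // leave MMX state before any x87 code
#else
    butterfly_c(a, b, sum, diff);                     // fallback when MMX is not targeted
#endif
}
```

Both functions are expected to produce identical results; the intrinsics version simply
processes all four elements with one add and one subtract.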
The vector class for MMX technology abstracts the intrinsics into C++ classes using
operator overloading, so developers can program in natural C++ without worrying about
which intrinsic function to use. Because of object construction and copying, the classes
can incur some additional overhead beyond that seen with intrinsics, but they reduce even
further the time spent programming for MMX technology and (in the future) the Katmai New
Instructions. There are three different versions of the vector classes supported by the Intel
C/C++ Compiler, one for each of the three data types supported by MMX technology: char,
short, and int. The MMXIvec_IDCT8X8AAN() routine in the vaan_idct.cpp
file uses the short version for 16-bit integers, I16vec4.
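The idea can be sketched with a small wrapper class. Vec4s below is a hypothetical
stand-in written for this page, not the I16vec4 class supplied with the compiler; it
only illustrates how overloaded operators can hide the intrinsic calls so that the
calling code reads as ordinary arithmetic. The sketch assumes the mmintrin.h intrinsics
_mm_add_pi16 and _mm_sub_pi16 and uses a scalar loop when MMX technology is not targeted.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

// Hypothetical 4 x 16-bit vector class (illustration only, not I16vec4).
class Vec4s {
public:
    Vec4s(std::int16_t e0, std::int16_t e1, std::int16_t e2, std::int16_t e3) {
        lane_[0] = e0; lane_[1] = e1; lane_[2] = e2; lane_[3] = e3;
    }

    std::int16_t operator[](int i) const { return lane_[i]; }

    // The overloads are what let callers write a + b instead of naming an intrinsic.
    friend Vec4s operator+(const Vec4s& a, const Vec4s& b) {
        return Vec4s::combine(a, b, true);
    }
    friend Vec4s operator-(const Vec4s& a, const Vec4s& b) {
        return Vec4s::combine(a, b, false);
    }

private:
    std::int16_t lane_[4];

    static Vec4s combine(const Vec4s& a, const Vec4s& b, bool add) {
        Vec4s r(0, 0, 0, 0);
#if defined(__MMX__)
        __m64 va, vb, vr;
        std::memcpy(&va, a.lane_, sizeof va);
        std::memcpy(&vb, b.lane_, sizeof vb);
        if (add) {
            vr = _mm_add_pi16(va, vb);   // element-wise 16-bit add
        } else {
            vr = _mm_sub_pi16(va, vb);   // element-wise 16-bit subtract
        }
        std::memcpy(r.lane_, &vr, sizeof vr);
        _mm_empty();                     // leave MMX state after each operation
#else
        for (int i = 0; i < 4; ++i) {
            int x = add ? a.lane_[i] + b.lane_[i] : a.lane_[i] - b.lane_[i];
            r.lane_[i] = static_cast<std::int16_t>(x);  // keep the low 16 bits
        }
#endif
        return r;
    }
};
```

With such a class, a butterfly step is just Vec4s sum = a + b; and Vec4s diff = a - b;.
The temporaries and copies in this sketch also show where the extra overhead relative to
raw intrinsics can come from.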
TimeDCT Demo Download
To run the demo, you need Windows* 95* or Windows NT* 4.0 running on a processor that
supports MMX technology. Follow these steps:
- Download the TimeDCTDownload.zip file.
- Unzip the TimeDCTDownload.zip file.
- Execute the TimeDCT.exe in the Release directory.
Under the "Timing" menu, the demo allows you to select the different versions
of the iDCT. For each version, the demo displays the clock counts required to make
100,000 calls to the routine, along with that count expressed as a percentage of the
clock count for the assembly version. Included in the download is a Microsoft
Visual C++* 5.0 project. Unfortunately, building it requires version 3.0 of the Intel
C/C++ Compiler, which is not scheduled to go to beta until later this summer. At CGDC98
we will be using an internal alpha version.
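The measurement described above can be sketched roughly as follows. This is not the
timing.cpp code from the Intel JPEG Library: it assumes std::chrono::steady_clock in
place of a processor clock counter, and the Routine signature and the names time_calls
and percent_of_baseline are hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical signature for an iDCT routine that works in place on one 8x8 block.
using Routine = void (*)(short* block);

// Time `calls` back-to-back invocations of `fn` and return the elapsed nanoseconds.
std::int64_t time_calls(Routine fn, short* block, int calls) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < calls; ++i) {
        fn(block);
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Express one measurement as a percentage of a baseline measurement,
// for example a version's time relative to the assembly version's time.
double percent_of_baseline(std::int64_t measured, std::int64_t baseline) {
    return 100.0 * static_cast<double>(measured) / static_cast<double>(baseline);
}
```

Calling time_calls with 100000 for each of the four routines and passing each result,
together with the assembly result, to percent_of_baseline would give figures of the
kind the demo displays.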
The Intel JPEG Library Looping Demo
This demo can be seen only at CGDC98. The demo was originally developed by the Intel Architecture
Labs and uses the Intel JPEG Library. It shows how much the coding method chosen for
the iDCT algorithm affects performance in a complete application.
The program decodes 45 different JPEG images and displays them in a window along with
the time taken to do the decoding and display. Even though the iDCT routine makes up only
about 30% of the computation time in the whole program, the difference between using the
C version and the others is very noticeable. The difference between the assembly
version and the intrinsics or Ivec versions, by contrast, is negligible, yet the
intrinsics and Ivec versions took much less time to code than the hand-tuned
assembly version.
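A back-of-the-envelope calculation suggests why the gap is negligible. The sketch below
assumes that the 30% share was measured with the assembly version and that the
intrinsics iDCT takes about 1.07 times as long as the assembly one (the 7% figure from
the TimeDCT demo); neither assumption is confirmed for this application, and the
function name relative_total_time is made up for illustration.

```cpp
// Relative run time of the whole program when one routine, which accounts for
// `routine_share` of the original run time, takes `routine_time_ratio` times as long.
double relative_total_time(double routine_share, double routine_time_ratio) {
    return (1.0 - routine_share) + routine_share * routine_time_ratio;
}
```

Under those assumptions, relative_total_time(0.30, 1.07) is about 1.021, so the whole
program would run only about 2% slower with the intrinsics version.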